尽管深度神经网络(DNNS)在环境感知任务上取得了令人印象深刻的表现,但它们对对抗性扰动的敏感性限制了它们在实际应用中的使用。在本文中,我们(i)提出了一种基于对复杂视觉任务的多任务感知(即深度估计和语义分割)的新型对抗扰动检测方案。具体而言,通过在输入图像的提取边缘,深度输出和分割输出之间的不一致之处检测到对抗性扰动。为了进一步提高这一技术,我们(ii)在所有三种方式之间发展了新颖的边缘一致性损失,从而提高了它们的初始一致性,从而支持我们的检测方案。我们通过采用各种已知攻击和图像噪声来验证检测方案的有效性。此外,我们(iii)开发了多任务对抗攻击,旨在欺骗这两个任务以及我们的检测方案。对城市景观和KITTI数据集的实验评估表明,在假设5%的假阳性率的假设下,最高100%的图像被正确检测为对抗性扰动,具体取决于扰动的强度。代码可在https://github.com/ifnspaml/advattackdet上找到。 https://youtu.be/kka6goywmh4的简短视频可提供定性结果。
translated by 谷歌翻译
自动驾驶车辆中的环境感知通常严重依赖于深度神经网络(DNN),这些神经网络受到域的转移,导致DNN部署期间的性能大大降低。通常,通过无监督的域适应(UDA)方法解决了此问题,同时在源和目标域数据集上训练了训练,甚至仅以离线方式对目标数据进行训练。在这项工作中,我们进一步将无源的UDA方法扩展到了连续的,因此可以在单一图像的基础上进行语义细分。因此,我们的方法仅需要供应商(在源域中训练)和电流(未标记的目标域)相机图像的预训练模型。我们的方法持续batchNorm适应(CBNA)使用目标域图像以无监督的方式修改了批准层中的源域统计信息,从而在推理过程中可以提高稳定的性能。因此,与现有作品相反,我们的方法可以应用于在部署期间不断地以单位图像改进DNN,而无需访问源数据,而无需算法延迟,并且几乎没有计算开销。我们在各种源/目标域设置中显示了我们方法在语义分割中的一致有效性。代码可在https://github.com/ifnspaml/cbna上找到。
translated by 谷歌翻译
在本文中,我们在不依赖于任何源域表示的情况下向“无监督域适应(UDA)的任务”的任务提供了一个解决方案。以前的UDA用于语义细分的方法使用在源域和目标域中的模型的同时训练,或者它们依赖于附加网络,在适应期间将源域知识重放到模型。相比之下,我们介绍了我们的小说无监督的批量适应(UBNA)方法,它将给定的预先训练模型适应未经使用的策略域而不使用 - 超出现有模型参数 - 任何源域表示(既不是数据或者,也可以在在线设置或仅以几滴方式使用从目标域中的几个未标记的图像中应用的。具体地,我们使用指数衰减的动量因子部分地将归一化层统计数据调整到目标域,从而将统计数据与两个域混合。通过评估语义分割的标准UDA基准测试,我们认为这优于一个没有适应的模型以及仅使用目标域中的统计数据的基线方法。与标准UDA方法相比,我们在源域表示的性能和使用之间报告权衡。
translated by 谷歌翻译
Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models or analyses with a downstream application such that error quantification plays a key role. However, by entirely ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting solving linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of a widely-applied theorem for conditioning GPs on a finite number of direct observations to observations made via an arbitrary bounded linear operator. Crucially, this probabilistic viewpoint allows to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers including collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate and the capability to incorporate uncertain model parameters and observations. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models.
translated by 谷歌翻译
For improving short-length codes, we demonstrate that classic decoders can also be used with real-valued, neural encoders, i.e., deep-learning based codeword sequence generators. Here, the classical decoder can be a valuable tool to gain insights into these neural codes and shed light on weaknesses. Specifically, the turbo-autoencoder is a recently developed channel coding scheme where both encoder and decoder are replaced by neural networks. We first show that the limited receptive field of convolutional neural network (CNN)-based codes enables the application of the BCJR algorithm to optimally decode them with feasible computational complexity. These maximum a posteriori (MAP) component decoders then are used to form classical (iterative) turbo decoders for parallel or serially concatenated CNN encoders, offering a close-to-maximum likelihood (ML) decoding of the learned codes. To the best of our knowledge, this is the first time that a classical decoding algorithm is applied to a non-trivial, real-valued neural code. Furthermore, as the BCJR algorithm is fully differentiable, it is possible to train, or fine-tune, the neural encoder in an end-to-end fashion.
translated by 谷歌翻译
Recently, there has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we first examined the existing synthesized datasets and discovered that state-of-the-art text-to-SQL algorithms did not further improve on popular benchmarks when trained with augmented synthetic data. We observed two shortcomings: illogical synthetic SQL queries from independent column sampling and arbitrary table joins. To address these issues, we propose a novel synthesis framework that incorporates key relationships from schema, imposes strong typing, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated natural language questions. When existing powerful semantic parsers are pre-finetuned on our high-quality synthesized data, our experiments show that these models have significant accuracy boosts on popular benchmarks, including new state-of-the-art performance on Spider.
translated by 谷歌翻译
In this paper a global reactive motion planning framework for robotic manipulators in complex dynamic environments is presented. In particular, the circular field predictions (CFP) planner from Becker et al. (2021) is extended to ensure obstacle avoidance of the whole structure of a robotic manipulator. Towards this end, a motion planning framework is developed that leverages global information about promising avoidance directions from arbitrary configuration space motion planners, resulting in improved global trajectories while reactively avoiding dynamic obstacles and decreasing the required computational power. The resulting motion planning framework is tested in multiple simulations with complex and dynamic obstacles and demonstrates great potential compared to existing motion planning approaches.
translated by 谷歌翻译
Whole slide images (WSI) are microscopy images of stained tissue slides routinely prepared for diagnosis and treatment selection in medical practice. WSI are very large (gigapixel size) and complex (made of up to millions of cells). The current state-of-the-art (SoTA) approach to classify WSI subdivides them into tiles, encodes them by pre-trained networks and applies Multiple Instance Learning (MIL) to train for specific downstream tasks. However, annotated datasets are often small, typically a few hundred to a few thousand WSI, which may cause overfitting and underperforming models. Conversely, the number of unannotated WSI is ever increasing, with datasets of tens of thousands (soon to be millions) of images available. While it has been previously proposed to use these unannotated data to identify suitable tile representations by self-supervised learning (SSL), downstream classification tasks still require full supervision because parts of the MIL architecture is not trained during tile level SSL pre-training. Here, we propose a strategy of slide level SSL to leverage the large number of WSI without annotations to infer powerful slide representations. Applying our method to The Cancer-Genome Atlas, one of the most widely used data resources in cancer research (16 TB image data), we are able to downsize the dataset to 23 MB without any loss in predictive power: we show that a linear classifier trained on top of these embeddings maintains or improves previous SoTA performances on various benchmark WSI classification tasks. Finally, we observe that training a classifier on these representations with tiny datasets (e.g. 50 slides) improved performances over SoTA by an average of +6.3 AUC points over all downstream tasks.
translated by 谷歌翻译
We present G-MSM (Graph-based Multi-Shape Matching), a novel unsupervised learning approach for non-rigid shape correspondence. Rather than treating a collection of input poses as an unordered set of samples, we explicitly model the underlying shape data manifold. To this end, we propose an adaptive multi-shape matching architecture that constructs an affinity graph on a given set of training shapes in a self-supervised manner. The key idea is to combine putative, pairwise correspondences by propagating maps along shortest paths in the underlying shape graph. During training, we enforce cycle-consistency between such optimal paths and the pairwise matches which enables our model to learn topology-aware shape priors. We explore different classes of shape graphs and recover specific settings, like template-based matching (star graph) or learnable ranking/sorting (TSP graph), as special cases in our framework. Finally, we demonstrate state-of-the-art performance on several recent shape correspondence benchmarks, including real-world 3D scan meshes with topological noise and challenging inter-class pairs.
translated by 谷歌翻译
In this work we present a fast occupancy map building approach based on the VDB datastructure. Existing log-odds based occupancy mapping systems are often not able to keep up with the high point densities and framerates of modern sensors. Therefore, we suggest a highly optimized approach based on a modern datastructure coming from a computer graphic background. A multithreaded insertion scheme allows occupancy map building at unprecedented speed. Multiple optimizations allow for a customizable tradeoff between runtime and map quality. We first demonstrate the effectiveness of the approach quantitatively on a set of ablation studies and typical benchmark sets, before we practically demonstrate the system using a legged robot and a UAV.
translated by 谷歌翻译